
All Questions

0 votes
3 answers
70 views

Using LLM/AI tools to identify entity types

I am working with a dataset that has a list of organization names, but the "type" of each organization is not given. What I mean by type is that I know that organizations within my list can fall ...
asked by Kuantew
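One way to approach this without labeled training data is zero-shot classification. A minimal sketch, assuming Hugging Face's zero-shot-classification pipeline; the model choice, organization names, and candidate type labels below are illustrative placeholders, not taken from the question:

```python
from transformers import pipeline

# Zero-shot classification scores each candidate label against the input
# text without any task-specific fine-tuning.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical names and types -- substitute the actual list and taxonomy.
org_names = ["Acme Robotics Inc.", "St. Mary's Hospital", "Springfield School District"]
candidate_types = ["company", "hospital", "school", "government agency", "nonprofit"]

for name in org_names:
    result = classifier(name, candidate_labels=candidate_types)
    print(name, "->", result["labels"][0])  # labels come back sorted by score
```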
0 votes
0 answers
21 views

Finding Contextual Synonyms that are not necessarily Grammatical Synonyms

I'm trying to learn whether there is a way to use ML to find a list of contextual synonyms for a word in a sentence. I know of some obvious approaches where you mask the word and have some model predict ...
asked by sharkeater123
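The masking approach the question mentions can be tried directly with a fill-mask pipeline. A minimal sketch, assuming Hugging Face's transformers with roberta-base; the example sentence is a hypothetical illustration:

```python
from transformers import pipeline

# Fill-mask predicts plausible replacements for the masked token in context,
# which can surface contextual rather than purely dictionary synonyms.
fill = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>; the sentence is hypothetical.
sentence = "The bank approved the <mask> for the new house."
for pred in fill(sentence, top_k=5):
    print(pred["token_str"].strip(), round(pred["score"], 3))
```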
-1 votes
1 answer
64 views

Where is machine learning leading? [closed]

I was looking at the progress of the more popular LLMs over the last few years and wondering whether, in the near future, through the use of semi-exhaustive methods, only those ...
asked by GEP
2 votes
2 answers
211 views

Is Llama3 fully open-source, including tokenizer, transformers, and other components needed to build a custom LLM?

I'm trying to understand whether Llama 3 (or other open-source models) is fully open-source. Specifically, I would like to know: Is the source code for Llama 3 (including the tokenizer, transformers, ...
asked by mlibre
5 votes
2 answers
297 views

Are the model implementations in Hugging Face’s transformers library created by the original model authors or by Hugging Face?

I've been exploring the implementation of models like Llama in Hugging Face’s transformers library, for example: Hugging Face's Llama model implementation. I’m ...
asked by mlibre
4 votes
1 answer
168 views

In the Manifold Hypothesis applied to LLMs, are text sequences points or paths on the manifold?

The Manifold Hypothesis makes a ton of sense to me for images. Images are points in high dimensional space, where each dimension corresponds to the intensity value of a single pixel. For example, we ...
asked by Stephen W.
3 votes
1 answer
1k views

Why do we use learnable positional encoding instead of sinusoidal positional encoding?

In the original transformer paper, positional encoding is used to capture the position of each word in the sentence, and it is calculated using sin and cos, as shown in the image. In BERT ...
asked by LAILA EL OUEDEGHYRY
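For reference, a minimal NumPy sketch of the sinusoidal encoding from the original paper, where PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); BERT instead learns a position-embedding table of the same shape:

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    positions = np.arange(max_len)[:, None]        # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions
    return pe

print(sinusoidal_positional_encoding(50, 64).shape)  # (50, 64)
```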
1 vote
2 answers
889 views

How are Q, K, V calculated in multi-head attention?

I want to understand the transformer architecture, so I started with self-attention and understood its mechanism, but when I moved on to multi-head attention I ran into some difficulties, like how ...
asked by LAILA EL OUEDEGHYRY
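A minimal PyTorch sketch of the standard construction: Q, K, and V all come from the same input via three learned linear projections, and multi-head attention splits each projection into num_heads smaller subspaces (the dimensions here are illustrative):

```python
import torch

batch, seq_len, d_model, num_heads = 2, 10, 64, 8
head_dim = d_model // num_heads

x = torch.randn(batch, seq_len, d_model)
W_q = torch.nn.Linear(d_model, d_model)  # learned projections
W_k = torch.nn.Linear(d_model, d_model)
W_v = torch.nn.Linear(d_model, d_model)

def split_heads(t):
    # (batch, seq, d_model) -> (batch, heads, seq, head_dim)
    return t.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)

q, k, v = split_heads(W_q(x)), split_heads(W_k(x)), split_heads(W_v(x))
scores = q @ k.transpose(-2, -1) / head_dim ** 0.5  # scaled dot-product
attn = torch.softmax(scores, dim=-1) @ v            # (batch, heads, seq, head_dim)
print(attn.shape)
```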
4 votes
2 answers
1k views

Why does different noise in a GAN generate different images?

I understand that noise $z$ serves as the input to the generator. Noise $z$ is essentially a vector of random numbers, typically drawn from a Gaussian distribution, with a chosen size such as $100$. However, I ...
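The key point is that a trained generator is a deterministic function of $z$, so different draws land on different outputs. A toy PyTorch sketch with a hypothetical, untrained generator (the architecture and sizes are placeholders):

```python
import torch
import torch.nn as nn

# Hypothetical toy generator: 100-dim noise -> flattened 28x28 "image".
generator = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),
)

z1 = torch.randn(1, 100)  # one draw from N(0, I)
z2 = torch.randn(1, 100)  # a different draw
img1, img2 = generator(z1), generator(z2)
print(torch.allclose(img1, img2))  # False: different z, different image
```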
1 vote
1 answer
140 views

Fine-tuning, feature extraction, or both with RoBERTa?

I'm reading a program that uses the pre-trained RoBERTa model (roberta-base). The code first extracts word embeddings from each caption in the batch, using the last hidden state of the RoBERTa model. ...
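For context, a minimal sketch of the feature-extraction half: run roberta-base frozen and read off the last hidden state (the caption is a hypothetical example). Fine-tuning would instead leave gradients enabled and update the weights through a task head:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

caption = "A dog playing in the park."  # hypothetical caption
inputs = tokenizer(caption, return_tensors="pt")
with torch.no_grad():  # frozen weights: pure feature extraction
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state  # (1, seq_len, 768)
print(embeddings.shape)
```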
2 votes
2 answers
1k views

What technique is used for training Large Language Models like GPT?

I'm learning about GenAI, such as GPT (Generative Pretrained Transformer), and I'm particularly interested in understanding the training techniques used for these models. Deep learning, generally, can ...
asked by Exploring
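GPT-style pretraining is self-supervised next-token prediction (causal language modeling): the labels are the input shifted by one position, so no human annotation is needed. A minimal sketch with Hugging Face's gpt2 checkpoint (the text is illustrative):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Large language models are trained to predict the next token."
inputs = tokenizer(text, return_tensors="pt")

# Passing labels=input_ids makes the model compute the causal LM loss;
# internally the labels are shifted so each position predicts the next token.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # cross-entropy over next-token predictions
```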
0 votes
0 answers
60 views

Understanding the concept of embeddings in the RoBERTa architecture

I'm reading the implementation file of the RoBERTa architecture, specifically the RobertaEmbedding class, which has the comment: ...
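Conceptually, that embedding layer sums a few lookup tables and normalizes the result. A simplified sketch of the idea, not the actual Hugging Face code (the token ids are hypothetical; one RoBERTa-specific detail is that position ids start at padding_idx + 1 rather than 0):

```python
import torch
import torch.nn as nn

# Simplified sketch, not the actual Hugging Face code. Sizes match
# roberta-base: vocab 50265, hidden 768, max positions 514, pad id 1.
word_emb = nn.Embedding(50265, 768, padding_idx=1)
pos_emb = nn.Embedding(514, 768)
type_emb = nn.Embedding(1, 768)
norm, drop = nn.LayerNorm(768), nn.Dropout(0.1)

input_ids = torch.tensor([[0, 31414, 232, 2]])  # hypothetical token ids
# RoBERTa starts position ids at padding_idx + 1 = 2 rather than 0.
position_ids = torch.arange(2, 2 + input_ids.size(1)).unsqueeze(0)
token_type_ids = torch.zeros_like(input_ids)

out = drop(norm(word_emb(input_ids) + pos_emb(position_ids) + type_emb(token_type_ids)))
print(out.shape)  # (1, 4, 768)
```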
0 votes
1 answer
222 views

How can I interpret the attention weights matrix? Is it reliable?

I've fine-tuned two different models (BERT and RoBERTa) on a dataset for a binary classification task, and I'm comparing the sentences where the models predict incorrectly. I decided to use attention weights ...
asked by Shayan
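Extracting the matrices is straightforward; whether they explain predictions is debated in the interpretability literature. A minimal sketch, using bert-base-uncased here only as a placeholder for the actual fine-tuned checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint -- substitute the actual fine-tuned model path.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, output_attentions=True)

inputs = tokenizer("This movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
last_layer = outputs.attentions[-1]
print(last_layer.mean(dim=1).shape)  # head-averaged (batch, seq, seq)
```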
0 votes
1 answer
134 views

Using naive Bayes vs. a transformer-based model for human-annotated data?

I have a Reddit dataset with thousands of online posts about the economy and inflation. We have used human annotation on 60% of the posts to determine whether users blame the following entities for the ...
asked by maldini1990
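A cheap baseline is usually worth running before reaching for a transformer. A minimal scikit-learn sketch with hypothetical posts and binary "blames this entity" labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical annotated subset -- substitute the real posts and labels.
posts = [
    "Inflation is the government's fault",
    "Prices are up because of supply chains",
    "The central bank caused this mess",
    "Global demand is driving prices higher",
]
labels = [1, 0, 1, 0]  # 1 = blames the entity, 0 = does not

baseline = make_pipeline(TfidfVectorizer(), MultinomialNB())
baseline.fit(posts, labels)
print(baseline.predict(["They blame the government for inflation"]))
```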
2 votes
1 answer
136 views

NLP "small" model to improve "big" model

When training a model for NLP, is it important to remove data that has "bad semantics" from the learning process? My plan is to create a "small model" that can decide whether data ...
asked by Milkmaid
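One way to sketch the filter-then-train idea: score each example with a small classifier and keep only what passes. The pipeline, model, and labels below are placeholders, not a recommendation:

```python
from transformers import pipeline

# Hypothetical quality filter using zero-shot classification.
quality = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

corpus = ["A clean, well-formed sentence.", "asdf qwerty zzzz !!"]
keep = [
    text for text in corpus
    if quality(text, candidate_labels=["well-formed", "garbage"])["labels"][0] == "well-formed"
]
print(keep)  # only accepted examples feed the "big" model's training set
```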
